{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a href=\"https://colab.research.google.com/github/Nyandwi/machine_learning_complete/blob/main/2_data_manipulation_with_pandas/2_data_manipulation_with_pandas.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "  </td>\n",
    "  <td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*This notebook was created by [Jean de Dieu Nyandwi](https://twitter.com/jeande_d) for the love of machine learning community. For any feedback, errors or suggestion, he can be reached on email (johnjw7084 at gmail dot com), [Twitter](https://twitter.com/jeande_d), or [LinkedIn](https://linkedin.com/in/nyandwi).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ErLffIn4mD93"
   },
   "source": [
    "<a name='0'></a>\n",
    "# Data Manipulation with Pandas\n",
    "\n",
    " In this lab, you will learn how to manipulate data with Pandas. Here is an overview:\n",
    "\n",
    "* [1. Basics of Pandas for data manipulation:](#1)\n",
    "\n",
    "     * [A. Series and DataFrames](#1-1)\n",
    "     * [B. Data Indexing and Selection, and Iteration](#1-2)\n",
    "     * [C. Dealing with Missing data](#1-3)\n",
    "     * [D. Basic operations and Functions](#1-4)\n",
    "     * [E. Aggregation Methods](#1-5)\n",
    "     * [F. Groupby](#1-6)\n",
    "     * [G. Merging, Joining and Concatenate](#1-7)\n",
    "     * [H. Beyond Dataframes: Working with CSV, and Excel](#1-8)\n",
    "     \n",
    "* [2. Real World Exploratory Data Analysis (EDA)](#2)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6cK48ySE4SB0"
   },
   "source": [
    "<a name='1'></a>\n",
    "## 1. Basics of Pandas for data manipulation\n",
    "\n",
    "<a name='1-1'></a>\n",
    "### A. Series and DataFrames\n",
    "\n",
    "Both series and DataFrames are Pandas Data structures. \n",
    "\n",
    "Series is like one dimensional NumPy array with axis labels.\n",
    "\n",
    "DataFrame is multidimensional NumPy array with labels on rows and columns. \n",
    "\n",
    "Working with NumPy, we saw that it supports numeric type data. Pandas on other hand supports whole range of data types, from numeric to strings, etc.."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6FckciW165U4"
   },
   "source": [
    "Since we are using python notebook, we do not need to install Pandas. We only just have to import it. \n",
    "\n",
    "```\n",
    "import pandas as pd\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "hl7wXbFo4ZLP"
   },
   "outputs": [],
   "source": [
    "# importing numpy and pandas\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "r1MZe-Th7jf-"
   },
   "source": [
    "#### Creating Series\n",
    "\n",
    "Series can be created from a Python list, dictionary, and NumPy array. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Ous_6XUS7lLa",
    "outputId": "83ada779-3f99-4536-8958-abb525d3409b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "4    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating the series from a Python list\n",
    "\n",
    "num_list = [1,2,3,4,5]\n",
    "\n",
    "pd.Series(num_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "E6aPp0JJ9MI9",
    "outputId": "0e2c3111-226d-4c5f-b2c2-fec67493c18a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a     Mon\n",
       "b    Tues\n",
       "c     Wed\n",
       "d    Thur\n",
       "e     Fri\n",
       "dtype: object"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "week_days = ['Mon','Tues','Wed','Thur','Fri']\n",
    "\n",
    "pd.Series(week_days, index=[\"a\", \"b\", \"c\", \"d\", \"e\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0gtsxrby9fHM"
   },
   "source": [
    "Note the data types `int64` and `object`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "MHWhzhTt9mev",
    "outputId": "1093b1e6-cdc8-41e2-d8d0-dbbe81d041bd"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1      United States\n",
       "91             India\n",
       "49           Germany\n",
       "86             China\n",
       "250           Rwanda\n",
       "dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating the Series from dictionary \n",
    "\n",
    "countries_code = { 1:\"United States\",\n",
    "                 91:\"India\",\n",
    "                 49:\"Germany\",\n",
    "                 86:\"China\",\n",
    "                250:\"Rwanda\"}\n",
    "\n",
    "pd.Series(countries_code)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "hCBCCRHZ-i_1",
    "outputId": "420e17e5-3dcb-48d8-8293-5bda93306c4d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    a\n",
       "2    b\n",
       "3    c\n",
       "4    d\n",
       "dtype: object"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = {1:'a', 2:'b', 3:'c', 4:'d'}\n",
    "pd.Series(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "lr0P6iSO_Ngl",
    "outputId": "69df04f1-e231-4d16-9917-cf04461a11c3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "4    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating the Series from NumPy array\n",
    "# We peovide the list of indexes\n",
    "# if we don't provide the indexes, the default indexes are numbers...starts from 0,1,2..\n",
    "\n",
    "arr = np.array ([1, 2, 3, 4, 5])\n",
    "pd.Series(arr)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vBV3WXje__5m",
    "outputId": "08444786-b5d7-454e-bb4f-3e279d8a898a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "e    5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(arr, index=['a', 'b', 'c', 'd', 'e'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YYpqI7l4AvOG"
   },
   "source": [
    "#### Creating DataFrames\n",
    "\n",
    "DataFrames are the most used Pandas data structure. It can be created from a dictionary, 2D array, and Series. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "cThhEpprAyLr",
    "outputId": "9f09b720-0a54-451d-9005-e70c91479031"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "0     USA      1\n",
       "1   India     91\n",
       "2  German     49\n",
       "3  Rwanda    250"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating DataFrame from a dictionary\n",
    "\n",
    "countries = {'Name': ['USA', 'India', 'German', 'Rwanda'], \n",
    "             \n",
    "             'Codes':[1, 91, 49, 250] }\n",
    "\n",
    "pd.DataFrame(countries)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "fwdwsY9uGdG0",
    "outputId": "945f8126-a85b-46f5-9cd8-f649c7301fcf"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0         1         2         3\n",
       "1         4         5         6\n",
       "2         7         8         9"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating a dataframe from a 2D array\n",
    "# You pass the list of columns\n",
    "\n",
    "array_2d = np.array ([[1,2,3], [4,5,6], [7,8,9]])\n",
    "\n",
    "pd.DataFrame(array_2d, columns = ['column 1', 'column 2', 'column 3'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "Y7ezd5g4HrHq",
    "outputId": "34eeaa06-92f3-40d9-c0f7-998d43f39931"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>China</th>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rwanda</th>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Codes\n",
       "United States      1\n",
       "India             91\n",
       "Germany           49\n",
       "China             86\n",
       "Rwanda           250"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating a dataframe from Pandas series \n",
    "# Pass the columns in a list\n",
    "\n",
    "countries_code = { \"United States\": 1,\n",
    "                 \"India\": 91,\n",
    "                 \"Germany\": 49,\n",
    "                 \"China\": 86,\n",
    "                 \"Rwanda\":250}\n",
    "\n",
    "pd_series = pd.Series (countries_code)\n",
    "\n",
    "pd.Series(countries_code)\n",
    "\n",
    "df = pd.DataFrame(pd_series, columns = ['Codes'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "F1JPrkTizQHM",
    "outputId": "bc3d53bc-19d8-402c-fd8d-78ade1623a90"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Codes</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>1</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>91</td>\n",
       "      <td>450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>49</td>\n",
       "      <td>575</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>China</th>\n",
       "      <td>86</td>\n",
       "      <td>5885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rwanda</th>\n",
       "      <td>250</td>\n",
       "      <td>533</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Codes  Population\n",
       "United States      1         100\n",
       "India             91         450\n",
       "Germany           49         575\n",
       "China             86        5885\n",
       "Rwanda           250         533"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Adding a column\n",
    "# Number in population are pretty random\n",
    "\n",
    "df ['Population'] = [100, 450, 575, 5885, 533]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "pympKqSv8ZNp",
    "outputId": "bf04e8f8-7d0e-474e-d287-95a326128461"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>China</th>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rwanda</th>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Codes\n",
       "United States      1\n",
       "India             91\n",
       "Germany           49\n",
       "China             86\n",
       "Rwanda           250"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Removing a column \n",
    "\n",
    "df.drop('Population', axis =1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Hz2u0p008lDR",
    "outputId": "927211e6-3536-43ec-9932-54b5298a45d6"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Codes', 'Population'], dtype='object')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "x0mYfsLt8oyC",
    "outputId": "033adeef-b0a6-407d-e58b-b293bc8c22d8"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<bound method NDFrame.keys of                Codes  Population\n",
       "United States      1         100\n",
       "India             91         450\n",
       "Germany           49         575\n",
       "China             86        5885\n",
       "Rwanda           250         533>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "DcGWsgwP8tuo",
    "outputId": "d9ab7016-b67e-4b59-f959-60d542ca7901"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['United States', 'India', 'Germany', 'China', 'Rwanda'], dtype='object')"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8S1rS2IqI5OQ"
   },
   "source": [
    "<a name='1-2'></a>\n",
    "### B. Data Indexing, Selection and Iteration\n",
    "\n",
    "Indexing and selection works in both Series and Dataframe.\n",
    "\n",
    "Because DataFrame is made of Series, let's focus on how to select data in DataFrame. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "xO-xCw1uI9Yt",
    "outputId": "008a4242-556f-42af-c946-55ff41f0cbe3"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "a     USA      1\n",
       "b   India     91\n",
       "c  German     49\n",
       "d  Rwanda    250"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating DataFrame from a dictionary\n",
    "\n",
    "countries = {'Name': ['USA', 'India', 'German', 'Rwanda'], \n",
    "             \n",
    "             'Codes':[1, 91, 49, 250] }\n",
    "\n",
    "df = pd.DataFrame(countries, index=['a', 'b', 'c', 'd'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "BhcNuJo1sBff",
    "outputId": "a89548bf-e083-496e-d7a2-93bc7378bce3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a       USA\n",
       "b     India\n",
       "c    German\n",
       "d    Rwanda\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "oBhhzCWyy_OJ",
    "outputId": "22207d29-0349-4a06-cb29-902bc389eb06"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a       USA\n",
       "b     India\n",
       "c    German\n",
       "d    Rwanda\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vRUbK_YLsQ0h",
    "outputId": "517f0652-6c83-4293-9b84-236263e6f856"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a      1\n",
       "b     91\n",
       "c     49\n",
       "d    250\n",
       "Name: Codes, dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df ['Codes']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "BGdQngaxsTbM",
    "outputId": "ce70d45c-1b3d-400b-f278-f986920ec16f"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "a     USA      1\n",
       "b   India     91\n",
       "c  German     49\n",
       "d  Rwanda    250"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## When you have many columns, columns in list will be selected\n",
    "\n",
    "df [['Name', 'Codes']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "hm5NIMGksb71",
    "outputId": "9333881c-eb66-4543-908a-a5b676d82d8c"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Name  Codes\n",
       "a    USA      1\n",
       "b  India     91"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This will return the first two rows\n",
    "df [0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7hsBOkVLwev1"
   },
   "source": [
    "You can also use `loc` to select data by the label indexes and `iloc` to select by default integer index (or by the position of the row)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "r0kBMm4Dwmyw",
    "outputId": "5cca5a84-43be-43ee-948f-a05f706dbdc1"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name     USA\n",
       "Codes      1\n",
       "Name: a, dtype: object"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['a']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "-S2cSr5Ax97c",
    "outputId": "efb0c4e8-7fa0-4fb7-9a12-ea2851ecd890"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "b   India     91\n",
       "c  German     49\n",
       "d  Rwanda    250"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['b':'d']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "RictqM1hyUYm",
    "outputId": "0d962413-d6c5-4cdd-800a-3eb99f98b597"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Name  Codes\n",
       "a    USA      1\n",
       "b  India     91"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df [:'b']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "m2Tauaenw6WQ",
    "outputId": "4cf58b8a-15e3-47a4-d849-cc4cde79c139"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name     German\n",
       "Codes        49\n",
       "Name: c, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "e8YeUoXZx4KA",
    "outputId": "e7b5c924-6755-44b7-8405-4cb8198113bf"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "b   India     91\n",
       "c  German     49"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "wGaDmJRGyYhy",
    "outputId": "ba0e9cbb-95d5-4fb3-b799-90ebd3c296f6"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "c  German     49\n",
       "d  Rwanda    250"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[2:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RCc2oXwszDDB"
   },
   "source": [
    "### Conditional Selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "iNxAW-NTzHAS",
    "outputId": "d66e4c9a-dfbc-486a-d7ae-d27487ed5722"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "a     USA      1\n",
       "b   India     91\n",
       "c  German     49\n",
       "d  Rwanda    250"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "4F3mr2Fszagx",
    "outputId": "878b4c87-ae97-47eb-df25-bb98ee575056"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "c  German     49"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Let's select a country with code 49\n",
    "\n",
    "df [df['Codes'] ==49 ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "xLSnNlqc0XMp",
    "outputId": "d006ddff-24e3-409e-aa5c-57522194fceb"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>German</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Codes\n",
       "a     USA      1\n",
       "b   India     91\n",
       "c  German     49"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df [df['Codes'] < 250 ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "_3g-C-UKzyq6",
    "outputId": "a8eecb7b-b35c-4de9-cabb-86e548183318"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>USA</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Name  Codes\n",
       "a  USA      1"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df [df['Name'] =='USA' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "EqqEQfhoz9IG",
    "outputId": "ac0c9c58-53fa-4228-d5f7-2742199a7471"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>India</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Name  Codes\n",
       "b  India     91"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can use and (&) or (|) for more than conditions\n",
    "#df [(condition 1) & (condition 2)]\n",
    "\n",
    "df [(df['Codes'] == 91 ) & (df['Name'] == 'India') ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ipe0NTjf2Db8"
   },
   "source": [
    "You can also use `isin()` and `where()` to select data in a series or dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "CJIqyvOC2P5A",
    "outputId": "8cf04e2e-40a4-4518-d8d7-2b0e8e5149cf"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Name  Codes\n",
       "a   True   True\n",
       "b   True  False\n",
       "c  False  False\n",
       "d  False   True"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# isin() return false or true when provided value is included in dataframe\n",
    "sample_codes_names=[1,3,250, 'USA', 'India', 'England']\n",
    "\n",
    "df.isin(sample_codes_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HsqhvWzo3B2N"
   },
   "source": [
    "As you can see, it returned `True` wherever a country code or name was found. Otherwise, `False`. You can use a dictinary to match search by columns. A key must be a column and values are passed in list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "fnx_IAnS3Wq8",
    "outputId": "e4f51707-8352-4aec-9e12-f2937a4804aa"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Codes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Name  Codes\n",
       "a   True   True\n",
       "b   True  False\n",
       "c  False  False\n",
       "d  False   True"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample_codes_names = {'Codes':[1,3,250], 'Name':['USA', 'India', 'England']}\n",
    "\n",
    "df.isin(sample_codes_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "PGv1D9K54knp",
    "outputId": "48868abb-3dcc-4344-db4d-5e9c63f1a1da"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0         1         2         3\n",
       "1         4         5         6\n",
       "2         7         8         9"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame(np.array ([[1,2,3], [4,5,6], [7,8,9]]), \n",
    "                   columns = ['column 1', 'column 2', 'column 3'])\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "BGU1CUQ142CW",
    "outputId": "9caa3f57-cfd5-423e-eba1-a9fdb13866cd"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0     False     False      True\n",
       "1      True      True     False\n",
       "2      True     False     False"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.isin([0,3,4,5,7])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "NsFfB57Y5ePv",
    "outputId": "ced02843-254f-40f5-af21-1a005677ccb5"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       NaN       NaN       NaN\n",
       "1       NaN       5.0       6.0\n",
       "2       7.0       8.0       9.0"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 [df2 > 4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "HTTed_RS5E9S",
    "outputId": "fe5b3755-85de-41de-87a9-3a4f190a2bc9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       NaN       NaN       NaN\n",
       "1       NaN       5.0       6.0\n",
       "2       7.0       8.0       9.0"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.where(df2 > 4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4cGg8isa5waQ"
   },
   "source": [
    "`where` allows you to replace the values that doesn't meet the provided condition with any other value. So, if we do `df2.where(df2 > 4, 0)` as follows, all values less than `4` will be replaced by `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "jcIdwzbI5LBd",
    "outputId": "4e043655-b16a-40ac-bb54-ad1365cf3815"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0         0         0         0\n",
       "1         0         5         6\n",
       "2         7         8         9"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.where(df2 > 4, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can achieve the above by..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "sJdjZySr5Uaz",
    "outputId": "476cd0a7-4228-4812-c65e-022f099c6cb4",
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0         0         0         0\n",
       "1         0         5         6\n",
       "2         7         8         9"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 [df2 <= 4] = 0\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sT3tBawQHIxh"
   },
   "source": [
    "### Iteration\n",
    "\n",
    "```\n",
    "df.items() #Iterate over (column name, Series) pairs.\n",
    "df.iteritems() Iterate over (column name, Series) pairs.\n",
    "DataFrame.iterrows() Iterate over DataFrame rows as (index, Series) pairs.\n",
    "DataFrame.itertuples([index, name]) Iterate over DataFrame rows as namedtuples.\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "column 1\n",
      "0    0\n",
      "1    0\n",
      "2    7\n",
      "Name: column 1, dtype: int64\n",
      "column 2\n",
      "0    0\n",
      "1    5\n",
      "2    8\n",
      "Name: column 2, dtype: int64\n",
      "column 3\n",
      "0    0\n",
      "1    6\n",
      "2    9\n",
      "Name: column 3, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Iterate over (column name, Series) pairs.\n",
    "\n",
    "for col_name, content in df2.items():\n",
    "    print(col_name)\n",
    "    print(content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "column 1\n",
      "0    0\n",
      "1    0\n",
      "2    7\n",
      "Name: column 1, dtype: int64\n",
      "column 2\n",
      "0    0\n",
      "1    5\n",
      "2    8\n",
      "Name: column 2, dtype: int64\n",
      "column 3\n",
      "0    0\n",
      "1    6\n",
      "2    9\n",
      "Name: column 3, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Iterate over (column name, Series) pairs.\n",
    "# Same as df.items()\n",
    "\n",
    "for col_name, content in df2.iteritems():\n",
    "    print(col_name)\n",
    "    print(content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(0, column 1    0\n",
      "column 2    0\n",
      "column 3    0\n",
      "Name: 0, dtype: int64)\n",
      "(1, column 1    0\n",
      "column 2    5\n",
      "column 3    6\n",
      "Name: 1, dtype: int64)\n",
      "(2, column 1    7\n",
      "column 2    8\n",
      "column 3    9\n",
      "Name: 2, dtype: int64)\n"
     ]
    }
   ],
   "source": [
    "# Iterate over DataFrame rows as (index, Series) pairs\n",
    "\n",
    "for row in df2.iterrows():\n",
    "    print(row)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pandas(Index=0, _1=0, _2=0, _3=0)\n",
      "Pandas(Index=1, _1=0, _2=5, _3=6)\n",
      "Pandas(Index=2, _1=7, _2=8, _3=9)\n"
     ]
    }
   ],
   "source": [
    "# Iterate over DataFrame rows as namedtuples\n",
    "\n",
    "for row in df2.itertuples():\n",
    "    print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notes: Thanks to [Prit Kalariya](https://twitter.com/pritkalariya) for Contributing the Iteration part!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5L0C1_d5840L"
   },
   "source": [
    "<a name='1-3'></a>\n",
    "### C. Dealing with Missing data\n",
    "\n",
    "Real world datasets are messy, often with missing values. Pandas replace NaN with missing values by default. NaN stands for not a number. \n",
    "\n",
    "Missing values can either be ignored, droped or filled. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "xWN2SE4K87Cn",
    "outputId": "8b639662-3a57-4d62-95d0-a1447c9c2bf0"
   },
   "outputs": [],
   "source": [
    "# Creating a dataframe\n",
    "\n",
    "df3 = pd.DataFrame(np.array ([[1,2,3], [4,np.nan,6], [7,np.nan,np.nan]]), \n",
    "                   columns = ['column 1', 'column 2', 'column 3'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gofEv_rhTxzz"
   },
   "source": [
    "#### Checking Missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "EQP7y0YaN5Jb",
    "outputId": "ce985690-efa5-4ad2-9258-91a0f3fd4141"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0     False     False     False\n",
       "1     False      True     False\n",
       "2     False      True      True"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Recognizing the missing values\n",
    "\n",
    "df3.isnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "BQSUMOXeOe5Z",
    "outputId": "9233ea74-868a-4f26-b275-96b3f8f46dbd"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "column 1    0\n",
       "column 2    2\n",
       "column 3    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Calculating number of the missing values in each feature\n",
    "\n",
    "df3.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "je6Rqhq0Opgg",
    "outputId": "967f225a-6b2f-494a-8e16-eb0ab558e622"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0      True      True      True\n",
       "1      True     False      True\n",
       "2      True     False     False"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Recognizng non missig values\n",
    "\n",
    "df3.notna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "9oJFnRwNQkMb",
    "outputId": "06373d27-24f4-4be3-e665-c3a055300b1d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "column 1    3\n",
       "column 2    1\n",
       "column 3    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.notna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OrrsYADLT2J7"
   },
   "source": [
    "#### Removing the missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "BtYAGZ3GQm7-",
    "outputId": "c1ef0bc6-09d0-4063-db4b-f6092fc30cc8"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Dropping missing values \n",
    "\n",
    "df3.dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "M03nac6CRfWQ"
   },
   "source": [
    "All rows are deleted because dropna() will remove each row which have missing value. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "6qdjI9S9Rfvk",
    "outputId": "b469ed24-0d8e-4ad6-f766-229e4151e06e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3.0\n",
       "1    6.0\n",
       "Name: column 3, dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can drop NaNs in specific column(s)\n",
    "\n",
    "df3['column 3'].dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "EWfHlBf7R0b1",
    "outputId": "ddbbf4c1-0eca-4a4c-b38a-654c3d79da66"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1\n",
       "0       1.0\n",
       "1       4.0\n",
       "2       7.0"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can drop data by axis \n",
    "# Axis = 1...drop all columns with Nans\n",
    "# df3.dropna(axis='columns')\n",
    "\n",
    "df3.dropna(axis=1)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "wTFG5MeBTG9y",
    "outputId": "dd8f4d08-5d82-498f-c3a3-fc35f6a8c8ba"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# axis = 0...drop all rows with Nans\n",
    "# df3.dropna(axis='rows') is same \n",
    "\n",
    "df3.dropna(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "daRUQLvNT7xT"
   },
   "source": [
    "#### Filling the missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "keazo2zoTKIm",
    "outputId": "0d3ebd6f-497e-4ad1-d00f-723e8a0f1386"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0\n",
       "1       4.0      10.0       6.0\n",
       "2       7.0      10.0      10.0"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filling Missing values\n",
    "\n",
    "df3.fillna(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "WI6t4b68V65Z",
    "outputId": "a2d6b6f3-9d6d-4e7b-a8e5-084b7d634ca7"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>fillme</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>fillme</td>\n",
       "      <td>fillme</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1 column 2 column 3\n",
       "0       1.0      2.0      3.0\n",
       "1       4.0   fillme      6.0\n",
       "2       7.0   fillme   fillme"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "df3.fillna('fillme')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "ZBFjHR0JWEzK",
    "outputId": "98d074dd-2060-4094-f5bb-f9d2b8b651d3"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0\n",
       "1       4.0       2.0       6.0\n",
       "2       7.0       2.0       6.0"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can forward fill (ffill) or backward fill(bfill)\n",
    "# Or fill a current value with previous or next value\n",
    "\n",
    "df3.fillna(method='ffill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "azOA4S_PWB-8",
    "outputId": "b9a3a1fb-0cb2-4160-85de-5ef2df307c37"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0\n",
       "1       4.0       NaN       6.0\n",
       "2       7.0       NaN       NaN"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Won't change it because the last values are NaNs, so it backward it\n",
    "\n",
    "df3.fillna(method='bfill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "uVkg9e9AWYvK",
    "outputId": "14ff8c58-6ac7-4a6e-e5f9-bccd7ebc1592"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "      <th>column 3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2  column 3\n",
       "0       1.0       2.0       3.0\n",
       "1       4.0       6.0       6.0\n",
       "2       7.0       NaN       NaN"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# If we change the axis to columns, you can see that Nans at row 2 and col 2 is backfilled with 6\n",
    "\n",
    "df3.fillna(method='bfill', axis='columns')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k3MilbA7XSPo"
   },
   "source": [
    "<a name='1-4'></a>\n",
    "### D. More Operations and Functions\n",
    "\n",
    "This section will show the more and most useful functions of Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "id": "NWubkTA6XUpl"
   },
   "outputs": [],
   "source": [
    "df4 = pd.DataFrame({'Product Name':['Shirt','Boot','Bag'], \n",
    "              'Order Number':[45,56,64], \n",
    "              'Total Quantity':[10,5,9]}, \n",
    "              columns = ['Product Name', 'Order Number', 'Total Quantity'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uHXUH7KdH-Mp"
   },
   "source": [
    "#### Retrieving basic info about the Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "TziPr2P2IFNG",
    "outputId": "8d7526ca-9b60-48d7-d292-9c8158921e53"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 3 columns):\n",
      " #   Column          Non-Null Count  Dtype \n",
      "---  ------          --------------  ----- \n",
      " 0   Product Name    3 non-null      object\n",
      " 1   Order Number    3 non-null      int64 \n",
      " 2   Total Quantity  3 non-null      int64 \n",
      "dtypes: int64(2), object(1)\n",
      "memory usage: 200.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "# Return a summary about the dataframe\n",
    "\n",
    "df4.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "39diMDz1IQyv",
    "outputId": "08be59cf-6bd2-4386-d0a5-454c423d072d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Product Name', 'Order Number', 'Total Quantity'], dtype='object')"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return dataframe columns\n",
    "\n",
    "df4.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-BESb6FxIk32",
    "outputId": "46ecc116-190d-49a0-be2c-54d966e64960"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<bound method NDFrame.keys of   Product Name  Order Number  Total Quantity\n",
       "0        Shirt            45              10\n",
       "1         Boot            56               5\n",
       "2          Bag            64               9>"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return dataframe data\n",
    "\n",
    "df4.keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "k5v_SKHlIn2C",
    "outputId": "6cfdcec6-76ab-42ff-de77-2d514a70ba63"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "0        Shirt            45              10"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the head of the dataframe ....could make sense if you have long frame\n",
    "# Choose how many rows you want in head()\n",
    "\n",
    "df4.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "id": "S5ZiB2wqI3rs",
    "outputId": "a95b716c-5a9b-4fb4-c0b3-8b0812cc0dce"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bag</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "2          Bag            64               9"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the tail of the dataframe\n",
    "\n",
    "df4.tail(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "IuVfuOuXK6KW",
    "outputId": "f31732e5-ad72-4c34-c5a9-2808b47e3b40"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([['Shirt', 45, 10],\n",
       "       ['Boot', 56, 5],\n",
       "       ['Bag', 64, 9]], dtype=object)"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return NumPy array of the dataframe\n",
    "\n",
    "df4.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "GMSw7p-3LAvn",
    "outputId": "b30e41d5-efe3-47d0-8aae-a7c23771850f"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the size or number of elements in a dataframe\n",
    "\n",
    "df4.size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-twlx7EFLI68",
    "outputId": "9d611a45-c218-4f73-c7ba-4cf36b0de15e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 3)"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the shape\n",
    "\n",
    "df4.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "AMIZ2GJCLU3h",
    "outputId": "59126c5d-5252-4508-b35d-1e0af8e8ca24"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the length of the dataframe/the number of rows in a dataframe\n",
    "\n",
    "df4.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the length of the dataframe/the number of columns in a dataframe\n",
    "\n",
    "df4.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NPsgE9ogLchY"
   },
   "source": [
    "#### Unique Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gpVCVrAELeGz",
    "outputId": "a16a2c3c-4197-4c63-f6be-85a770891ac7"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Shirt', 'Boot', 'Bag'], dtype=object)"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return unique values in a given column \n",
    "\n",
    "df4['Product Name'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vc6k80jULlJ2",
    "outputId": "304b4b99-23b0-44f6-d48f-b8c15c33ee66"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return a number of unique values\n",
    "df4['Product Name'].nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4E5tnSieL3q1",
    "outputId": "3245bb8c-473b-4eee-8b1c-1c520362b09b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Shirt    1\n",
       "Boot     1\n",
       "Bag      1\n",
       "Name: Product Name, dtype: int64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Counting the occurence of each value in a column \n",
    "\n",
    "df4['Product Name'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iKHhUk07Qsfr"
   },
   "source": [
    "#### Applying a Function to Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "id": "JW0yon8ZQuRk"
   },
   "outputs": [],
   "source": [
    "# Double the quantity product\n",
    "\n",
    "def double_quantity(x):\n",
    "  return x * x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "AYAD97GwRrD4",
    "outputId": "70ccdfdd-a574-47a6-9022-1a437c5ebc68"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    100\n",
       "1     25\n",
       "2     81\n",
       "Name: Total Quantity, dtype: int64"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4['Total Quantity'].apply(double_quantity)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "arYX52diU25C",
    "outputId": "a167184b-2eed-4d4c-b6bd-00ecfc505250"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>16</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2\n",
       "0     1     4\n",
       "1    16    25"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can also apply an anonymous function to a dataframe\n",
    "# Squaring each value in dataframe\n",
    "\n",
    "df5 = pd.DataFrame([[1,2], [4,5]], columns=['col1', 'col2'])\n",
    "\n",
    "df5.applymap(lambda x: x**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KQgkBB6jW0mb"
   },
   "source": [
    "#### Sorting values in dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "oVQUpEWpW3eh",
    "outputId": "08573bf2-357c-4ecb-99e9-8e55423ed85e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boot</td>\n",
       "      <td>56</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bag</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "0        Shirt            45              10\n",
       "1         Boot            56               5\n",
       "2          Bag            64               9"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort the df4 by the order number\n",
    "\n",
    "df4.sort_values(['Order Number'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "SXHVgJOIXrYM",
    "outputId": "16c2c64a-8a3a-41b0-f21d-f94ba83e5d5d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bag</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boot</td>\n",
       "      <td>56</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "2          Bag            64               9\n",
       "1         Boot            56               5\n",
       "0        Shirt            45              10"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.sort_values(['Order Number'], ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Nf96M45vX6WI"
   },
   "source": [
    "<a name='1-5'></a>\n",
    "### E. Aggregation Methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "YoqCRTKNYPsX",
    "outputId": "59bc6d83-8bac-4144-9dfa-02c5230fc09f"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boot</td>\n",
       "      <td>56</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bag</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "0        Shirt            45              10\n",
       "1         Boot            56               5\n",
       "2          Bag            64               9"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "sGMYqGLDOW7y",
    "outputId": "39f39243-3153-4630-be08-6eb0614129e0"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>55.000000</td>\n",
       "      <td>8.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>9.539392</td>\n",
       "      <td>2.645751</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>45.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>50.500000</td>\n",
       "      <td>7.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>56.000000</td>\n",
       "      <td>9.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>60.000000</td>\n",
       "      <td>9.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>64.000000</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Order Number  Total Quantity\n",
       "count      3.000000        3.000000\n",
       "mean      55.000000        8.000000\n",
       "std        9.539392        2.645751\n",
       "min       45.000000        5.000000\n",
       "25%       50.500000        7.000000\n",
       "50%       56.000000        9.000000\n",
       "75%       60.000000        9.500000\n",
       "max       64.000000       10.000000"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# summary statistics\n",
    "\n",
    "df4.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "1fhz845VOdb_",
    "outputId": "d96f93de-619a-426c-ffc6-4865bbc928c3"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Order Number</th>\n",
       "      <td>3.0</td>\n",
       "      <td>55.0</td>\n",
       "      <td>9.539392</td>\n",
       "      <td>45.0</td>\n",
       "      <td>50.5</td>\n",
       "      <td>56.0</td>\n",
       "      <td>60.0</td>\n",
       "      <td>64.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Total Quantity</th>\n",
       "      <td>3.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>2.645751</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>9.5</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                count  mean       std   min   25%   50%   75%   max\n",
       "Order Number      3.0  55.0  9.539392  45.0  50.5  56.0  60.0  64.0\n",
       "Total Quantity    3.0   8.0  2.645751   5.0   7.0   9.0   9.5  10.0"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "df4.describe().transpose()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "e9RKm4TIHoui",
    "outputId": "23e93644-03f3-457f-e0bc-cea6b1f117bd"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5\n",
       "1     9\n",
       "2    10\n",
       "dtype: int64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Mode of the dataframe\n",
    "# Mode is the most recurring values\n",
    "\n",
    "df4['Total Quantity'].mode()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "IzRRvXcOIbzC",
    "outputId": "938e4d3a-caa0-413e-d17a-a9551ff91afd"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The maximum value\n",
    "\n",
    "df4['Total Quantity'].max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "T6x7YvCxJCFg",
    "outputId": "59f4672f-180a-43fb-8185-417550064ae7"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The minimum value\n",
    "\n",
    "df4['Total Quantity'].min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Jq8PKQIcJIpO",
    "outputId": "f6804d4e-22ba-462a-dbcc-b415bf510b6c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8.0"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The mean\n",
    "\n",
    "df4['Total Quantity'].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "kxl8cn4jJYDg",
    "outputId": "943e05b4-da1a-4322-e594-9e3e16fbc1e5"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9.0"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The median value in a dataframe\n",
    "\n",
    "df4['Total Quantity'].median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "7sAHCt3rJg0P",
    "outputId": "542668fc-cdee-45a8-d062-27744fc04fb0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.6457513110645907"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Standard deviation\n",
    "\n",
    "df4['Total Quantity'].std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "VryxdVidJy3q",
    "outputId": "7cecf363-c5fe-4994-961e-6696e7cf62aa"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.0"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Variance \n",
    "\n",
    "df4['Total Quantity'].var()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "1BuyPI-zJ69A",
    "outputId": "26dc9cf7-aa05-44ff-94ea-38153a7e1ef2"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum of all values in a column\n",
    "\n",
    "df4['Total Quantity'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "EhHTxpRrKD7I",
    "outputId": "5d80593c-5d5f-499b-8122-a36c281439ff"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "450"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Product of all values in dataframe\n",
    "\n",
    "df4['Total Quantity'].prod()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "viZf9yCpK4xO"
   },
   "source": [
    "<a name='1-6'></a>\n",
    "### F. Groupby\n",
    "\n",
    "`Group by` involves splitting data into groups, applying function to each group, and combining the results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "id": "AmKHc2IjMZdE"
   },
   "outputs": [],
   "source": [
    "df4 = pd.DataFrame({'Product Name':['Shirt','Boot','Bag', 'Ankle', 'Pullover', 'Boot', 'Ankle', 'Tshirt', 'Shirt'], \n",
    "              'Order Number':[45,56,64, 34, 67, 56, 34, 89, 45], \n",
    "              'Total Quantity':[10,5,9, 11, 11, 8, 14, 23, 10]}, \n",
    "              columns = ['Product Name', 'Order Number', 'Total Quantity'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 328
    },
    "id": "lIhEZ-cZMzUh",
    "outputId": "0ec73ad8-626b-490f-ff37-cdbf99c2db83"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boot</td>\n",
       "      <td>56</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bag</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Ankle</td>\n",
       "      <td>34</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pullover</td>\n",
       "      <td>67</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Boot</td>\n",
       "      <td>56</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Ankle</td>\n",
       "      <td>34</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Tshirt</td>\n",
       "      <td>89</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Shirt</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Product Name  Order Number  Total Quantity\n",
       "0        Shirt            45              10\n",
       "1         Boot            56               5\n",
       "2          Bag            64               9\n",
       "3        Ankle            34              11\n",
       "4     Pullover            67              11\n",
       "5         Boot            56               8\n",
       "6        Ankle            34              14\n",
       "7       Tshirt            89              23\n",
       "8        Shirt            45              10"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "jNLj7ehAM02_",
    "outputId": "06525190-572e-47f2-ac9c-4eb31a33c885"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <td>34.0</td>\n",
       "      <td>12.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <td>64.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <td>56.0</td>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <td>67.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <td>45.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <td>89.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Order Number  Total Quantity\n",
       "Product Name                              \n",
       "Ankle                 34.0            12.5\n",
       "Bag                   64.0             9.0\n",
       "Boot                  56.0             6.5\n",
       "Pullover              67.0            11.0\n",
       "Shirt                 45.0            10.0\n",
       "Tshirt                89.0            23.0"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Let's group the df by product name\n",
    "\n",
    "df4.groupby('Product Name').mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "gHaQZBIXNR-s",
    "outputId": "aaae2325-9114-4e52-fb15-ac34ba4ec847"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <td>68</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <td>112</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <td>67</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <td>90</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <td>89</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Order Number  Total Quantity\n",
       "Product Name                              \n",
       "Ankle                   68              25\n",
       "Bag                     64               9\n",
       "Boot                   112              13\n",
       "Pullover                67              11\n",
       "Shirt                   90              20\n",
       "Tshirt                  89              23"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby('Product Name').sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "tgJA87lTN3dJ",
    "outputId": "13bc0ceb-09be-45df-b066-c95ec81a760a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <td>34</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <td>56</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <td>67</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <td>89</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Order Number  Total Quantity\n",
       "Product Name                              \n",
       "Ankle                   34              11\n",
       "Bag                     64               9\n",
       "Boot                    56               5\n",
       "Pullover                67              11\n",
       "Shirt                   45              10\n",
       "Tshirt                  89              23"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby('Product Name').min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "1hPLubKZN83R",
    "outputId": "b0550960-111c-4e6c-acf1-a84f747f0035"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Order Number</th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <td>34</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <td>56</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <td>67</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <td>89</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Order Number  Total Quantity\n",
       "Product Name                              \n",
       "Ankle                   34              14\n",
       "Bag                     64               9\n",
       "Boot                    56               8\n",
       "Pullover                67              11\n",
       "Shirt                   45              10\n",
       "Tshirt                  89              23"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby('Product Name').max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "lUvqB_Y-P-K8",
    "outputId": "08dd63ea-7002-4cfa-daad-3c6e1725fe07"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <th>34</th>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <th>64</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <th>56</th>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <th>67</th>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <th>45</th>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <th>89</th>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           Total Quantity\n",
       "Product Name Order Number                \n",
       "Ankle        34                        14\n",
       "Bag          64                         9\n",
       "Boot         56                         8\n",
       "Pullover     67                        11\n",
       "Shirt        45                        10\n",
       "Tshirt       89                        23"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby(['Product Name', 'Order Number']).max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 266
    },
    "id": "kk8w4IMCQKvu",
    "outputId": "52c5e63c-566e-4dae-b0c0-d5c8515fbfae"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th>Order Number</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <th>34</th>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <th>64</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <th>56</th>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <th>67</th>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <th>45</th>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <th>89</th>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           Total Quantity\n",
       "Product Name Order Number                \n",
       "Ankle        34                        25\n",
       "Bag          64                         9\n",
       "Boot         56                        13\n",
       "Pullover     67                        11\n",
       "Shirt        45                        20\n",
       "Tshirt       89                        23"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby(['Product Name', 'Order Number']).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zvqtUO0lO2HS"
   },
   "source": [
    "You can also use `aggregation()` after groupby. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "i6JlX-yrO2iP",
    "outputId": "78690759-45b2-468e-f148-bdebb08014fd"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">Order Number</th>\n",
       "      <th colspan=\"3\" halign=\"left\">Total Quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>sum</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>sum</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Product Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ankle</th>\n",
       "      <td>34</td>\n",
       "      <td>34</td>\n",
       "      <td>68</td>\n",
       "      <td>11</td>\n",
       "      <td>14</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bag</th>\n",
       "      <td>64</td>\n",
       "      <td>64</td>\n",
       "      <td>64</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boot</th>\n",
       "      <td>56</td>\n",
       "      <td>56</td>\n",
       "      <td>112</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pullover</th>\n",
       "      <td>67</td>\n",
       "      <td>67</td>\n",
       "      <td>67</td>\n",
       "      <td>11</td>\n",
       "      <td>11</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Shirt</th>\n",
       "      <td>45</td>\n",
       "      <td>45</td>\n",
       "      <td>90</td>\n",
       "      <td>10</td>\n",
       "      <td>10</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tshirt</th>\n",
       "      <td>89</td>\n",
       "      <td>89</td>\n",
       "      <td>89</td>\n",
       "      <td>23</td>\n",
       "      <td>23</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Order Number          Total Quantity        \n",
       "                      min max  sum            min max sum\n",
       "Product Name                                             \n",
       "Ankle                  34  34   68             11  14  25\n",
       "Bag                    64  64   64              9   9   9\n",
       "Boot                   56  56  112              5   8  13\n",
       "Pullover               67  67   67             11  11  11\n",
       "Shirt                  45  45   90             10  10  20\n",
       "Tshirt                 89  89   89             23  23  23"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.groupby('Product Name').aggregate(['min', 'max', 'sum'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iaDkcuplQfD7"
   },
   "source": [
    "<a name='1-7'></a>\n",
    "### G. Combining Datasets: Concatenating, Joining and Merging\n",
    "\n",
    "#### Concatenation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "id": "7Sas3WNFQgf6"
   },
   "outputs": [],
   "source": [
    "# Creating dataframes\n",
    "\n",
    "df1 = pd.DataFrame({'Col1':['A','B','C'],\n",
    "                   'Col2':[1,2,3]}, \n",
    "                   index=['a','b','c'])\n",
    "\n",
    "df2 = pd.DataFrame({'Col1':['D','E','F'],\n",
    "                   'Col2':[4,5,6]}, \n",
    "                   index=['d','e','f'])\n",
    "\n",
    "\n",
    "df3 = pd.DataFrame({'Col1':['G','I','J'],\n",
    "                   'Col2':[7,8,9]}, \n",
    "                   index=['g', 'i','j'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "DWCL_KbiV00-",
    "outputId": "c16c316a-3a6c-48c3-98dc-6538bc717ca2"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "a    A     1\n",
       "b    B     2\n",
       "c    C     3"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "mCmFFGDcXBlx",
    "outputId": "a63840b8-26a5-4c04-9088-87824453b90e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>D</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>E</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>F</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "d    D     4\n",
       "e    E     5\n",
       "f    F     6"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "HftwMeb1XEI6",
    "outputId": "d8ac4302-d7df-49c6-c251-609a1525cc76"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>G</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>I</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>J</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "g    G     7\n",
       "i    I     8\n",
       "j    J     9"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 328
    },
    "id": "GJ4UHZakXFiM",
    "outputId": "d3af8f06-e924-48db-98db-a96cbdc6e7ca"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>D</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>E</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>F</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>G</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>I</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>J</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "a    A     1\n",
       "b    B     2\n",
       "c    C     3\n",
       "d    D     4\n",
       "e    E     5\n",
       "f    F     6\n",
       "g    G     7\n",
       "i    I     8\n",
       "j    J     9"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Concatenating: Adding one dataset to another\n",
    "\n",
    "pd.concat([df1, df2, df3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "A8NR3BpsM_Et"
   },
   "source": [
    "The default axis is `0`. This is how the combined dataframes will look like if we change the `axis to 1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 328
    },
    "id": "n2QCVHToNYIT",
    "outputId": "96e1fefb-2e32-4838-e8e4-48858d8bd93c"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>3.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>D</td>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>E</td>\n",
       "      <td>5.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>G</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>I</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>J</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2 Col1  Col2 Col1  Col2\n",
       "a    A   1.0  NaN   NaN  NaN   NaN\n",
       "b    B   2.0  NaN   NaN  NaN   NaN\n",
       "c    C   3.0  NaN   NaN  NaN   NaN\n",
       "d  NaN   NaN    D   4.0  NaN   NaN\n",
       "e  NaN   NaN    E   5.0  NaN   NaN\n",
       "f  NaN   NaN    F   6.0  NaN   NaN\n",
       "g  NaN   NaN  NaN   NaN    G   7.0\n",
       "i  NaN   NaN  NaN   NaN    I   8.0\n",
       "j  NaN   NaN  NaN   NaN    J   9.0"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df2, df3], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 328
    },
    "id": "VL4FtD33Xdln",
    "outputId": "1bacc555-8d28-45a0-ddcc-56f2221a4b66"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>D</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>E</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>F</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>G</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>I</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>J</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "a    A     1\n",
       "b    B     2\n",
       "c    C     3\n",
       "d    D     4\n",
       "e    E     5\n",
       "f    F     6\n",
       "g    G     7\n",
       "i    I     8\n",
       "j    J     9"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We can also use append()\n",
    "\n",
    "df1.append([df2, df3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "G5RJbmpqN5C1"
   },
   "source": [
    "#### Merging\n",
    "\n",
    "If you have worked with SQL, what `pd.merge()` does may be familiar. It links data from different sources (different features) and you have a control on the structure of the combined dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DvlwrPIsaLho"
   },
   "source": [
    "*Pandas Merge method(`how`): SQL Join Name : Description*\n",
    "\n",
    "\n",
    "\n",
    "```\n",
    "* left : LEFT OUTER JOIN : Use keys or columns from left frame only\n",
    "\n",
    "* right : RIGHT OUTER JOIN : Use keys or columns from right frame only\n",
    "\n",
    "* outer : FULL OUTER JOIN : Use union of keys or columns from both frames\n",
    "\n",
    "* inner : INNER JOIN : Use intersection of keys or columns from both frames\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "id": "xJvZbm8ZN1SB"
   },
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame({'Name': ['Joe', 'Joshua', 'Jeanne', 'David'],\n",
    "                        'Role': ['Manager', 'Developer', 'Engineer', 'Scientist']})\n",
    "\n",
    "df2 = pd.DataFrame({'Name': ['David', 'Joshua', 'Joe', 'Jeanne'],\n",
    "'Year Hired': [2018, 2017, 2020, 2018]})\n",
    "\n",
    "df3 = pd.DataFrame({'Name': ['David', 'Joshua', 'Joe', 'Jeanne'],\n",
    "'No of Leaves': [15, 3, 10, 12]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "fRK8jNJ2SqxM",
    "outputId": "a3782332-f882-4496-892a-6fb175af4d52"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Role</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Joe</td>\n",
       "      <td>Manager</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Joshua</td>\n",
       "      <td>Developer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jeanne</td>\n",
       "      <td>Engineer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>Scientist</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name       Role\n",
       "0     Joe    Manager\n",
       "1  Joshua  Developer\n",
       "2  Jeanne   Engineer\n",
       "3   David  Scientist"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "LlzW52vJStN7",
    "outputId": "cbc480a2-0ede-4b5e-e55f-bac0a5b1f827"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Year Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>David</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Joshua</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Joe</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Jeanne</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Year Hired\n",
       "0   David        2018\n",
       "1  Joshua        2017\n",
       "2     Joe        2020\n",
       "3  Jeanne        2018"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "JroYckVMSv1i",
    "outputId": "d4cdb9a9-9054-4249-b809-ccd8ff8d5f07"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Role</th>\n",
       "      <th>Year Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Joe</td>\n",
       "      <td>Manager</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Joshua</td>\n",
       "      <td>Developer</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jeanne</td>\n",
       "      <td>Engineer</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>Scientist</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name       Role  Year Hired\n",
       "0     Joe    Manager        2020\n",
       "1  Joshua  Developer        2017\n",
       "2  Jeanne   Engineer        2018\n",
       "3   David  Scientist        2018"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "Qso0a2AMS-qg",
    "outputId": "56bcee2b-4548-45fa-f04e-5f7a588dd8f4"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Role</th>\n",
       "      <th>Year Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Joe</td>\n",
       "      <td>Manager</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Joshua</td>\n",
       "      <td>Developer</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jeanne</td>\n",
       "      <td>Engineer</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>Scientist</td>\n",
       "      <td>2018</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name       Role  Year Hired\n",
       "0     Joe    Manager        2020\n",
       "1  Joshua  Developer        2017\n",
       "2  Jeanne   Engineer        2018\n",
       "3   David  Scientist        2018"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Let's merge on Role being a key \n",
    "\n",
    "pd.merge(df1, df2, how='inner', on=\"Name\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "id": "CvSLfcCZWrN8"
   },
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame({'col1': ['K0', 'K0', 'K1', 'K2'],\n",
    "                     'col2': ['K0', 'K1', 'K0', 'K1'],\n",
    "                        'A': ['A0', 'A1', 'A2', 'A3'],\n",
    "                        'B': ['B0', 'B1', 'B2', 'B3']})\n",
    "    \n",
    "df2 = pd.DataFrame({'col1': ['K0', 'K1', 'K1', 'K2'],\n",
    "                               'col2': ['K0', 'K0', 'K0', 'K0'],\n",
    "                                  'C': ['C0', 'C1', 'C2', 'C3'],\n",
    "                                  'D': ['D0', 'D1', 'D2', 'D3']})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "fNuBZxJ3Y2tR",
    "outputId": "04fead01-62f2-4bf6-9610-03aae17f130b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2   A   B\n",
       "0   K0   K0  A0  B0\n",
       "1   K0   K1  A1  B1\n",
       "2   K1   K0  A2  B2\n",
       "3   K2   K1  A3  B3"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "BnH1xC4WY31H",
    "outputId": "899584c7-05ed-4504-9b08-6e6858c5f822"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2   C   D\n",
       "0   K0   K0  C0  D0\n",
       "1   K1   K0  C1  D1\n",
       "2   K1   K0  C2  D2\n",
       "3   K2   K0  C3  D3"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "bQ_chqv-VpxK",
    "outputId": "79b127fd-90ea-43b7-ccc9-1d601ad4e439"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2   A   B   C   D\n",
       "0   K0   K0  A0  B0  C0  D0\n",
       "1   K1   K0  A2  B2  C1  D1\n",
       "2   K1   K0  A2  B2  C2  D2"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how='inner', on=['col1', 'col2'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 235
    },
    "id": "AA0B0vnOZBe-",
    "outputId": "804bc5a4-135b-418c-e214-adcca9c76df5"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2    A    B    C    D\n",
       "0   K0   K0   A0   B0   C0   D0\n",
       "1   K0   K1   A1   B1  NaN  NaN\n",
       "2   K1   K0   A2   B2   C1   D1\n",
       "3   K1   K0   A2   B2   C2   D2\n",
       "4   K2   K1   A3   B3  NaN  NaN\n",
       "5   K2   K0  NaN  NaN   C3   D3"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how='outer', on=['col1', 'col2'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "jqLLLuMwZYSS",
    "outputId": "ad49df22-ff36-47dc-d9f4-224da1fbcd61"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K0</td>\n",
       "      <td>K1</td>\n",
       "      <td>A1</td>\n",
       "      <td>B1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>K2</td>\n",
       "      <td>K1</td>\n",
       "      <td>A3</td>\n",
       "      <td>B3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2   A   B    C    D\n",
       "0   K0   K0  A0  B0   C0   D0\n",
       "1   K0   K1  A1  B1  NaN  NaN\n",
       "2   K1   K0  A2  B2   C1   D1\n",
       "3   K1   K0  A2  B2   C2   D2\n",
       "4   K2   K1  A3  B3  NaN  NaN"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "92MAb163Z0IM",
    "outputId": "bf72f4bb-b61f-4149-d42a-ec542b29ca0e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K0</td>\n",
       "      <td>K0</td>\n",
       "      <td>A0</td>\n",
       "      <td>B0</td>\n",
       "      <td>C0</td>\n",
       "      <td>D0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C1</td>\n",
       "      <td>D1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>K1</td>\n",
       "      <td>K0</td>\n",
       "      <td>A2</td>\n",
       "      <td>B2</td>\n",
       "      <td>C2</td>\n",
       "      <td>D2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>K2</td>\n",
       "      <td>K0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C3</td>\n",
       "      <td>D3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1 col2    A    B   C   D\n",
       "0   K0   K0   A0   B0  C0  D0\n",
       "1   K1   K0   A2   B2  C1  D1\n",
       "2   K1   K0   A2   B2  C2  D2\n",
       "3   K2   K0  NaN  NaN  C3  D3"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how='right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eBHN57CcYhI"
   },
   "source": [
    "#### Joining\n",
    "\n",
    "Joining is a simple way to combine columns of two dataframes with different indexes. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {
    "id": "cc9WCJHzc5Lt"
   },
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame({'Col1': ['A', 'B', 'C'],\n",
    "                     'Col2': [11, 12, 13]},\n",
    "                      index=['a', 'b', 'c']) \n",
    "\n",
    "df2 = pd.DataFrame({'Col3': ['D', 'E', 'F'],\n",
    "                    'Col4': [14, 14, 16]},\n",
    "                      index=['a', 'c', 'd'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "-2vPNNXcd532",
    "outputId": "314fc545-bcba-418d-ecf9-e236df1de66a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2\n",
       "a    A    11\n",
       "b    B    12\n",
       "c    C    13"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "fPRLEeLod7gW",
    "outputId": "ab506d97-75c6-4c15-a740-5148eec89acc"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col3</th>\n",
       "      <th>Col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>D</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>E</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>F</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col3  Col4\n",
       "a    D    14\n",
       "c    E    14\n",
       "d    F    16"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "9vjxh0O-d9e2",
    "outputId": "02cb3f80-887d-447a-b2cb-b21e0402f85e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "      <th>Col3</th>\n",
       "      <th>Col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>11</td>\n",
       "      <td>D</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>12</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>13</td>\n",
       "      <td>E</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2 Col3  Col4\n",
       "a    A    11    D  14.0\n",
       "b    B    12  NaN   NaN\n",
       "c    C    13    E  14.0"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.join(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "RwuSaXD5eBPN",
    "outputId": "0ced8f45-3247-4b21-bac5-e1909b3a9b60"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col3</th>\n",
       "      <th>Col4</th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>D</td>\n",
       "      <td>14</td>\n",
       "      <td>A</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>E</td>\n",
       "      <td>14</td>\n",
       "      <td>C</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>F</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col3  Col4 Col1  Col2\n",
       "a    D    14    A  11.0\n",
       "c    E    14    C  13.0\n",
       "d    F    16  NaN   NaN"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.join(df1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5ApI4u3DeSk1"
   },
   "source": [
    "You can see that with `df.join()`, the alignment of data is on indexes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 111
    },
    "id": "_i-_dSy5efHn",
    "outputId": "f1329d33-4263-493b-f416-1dca2e60206a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "      <th>Col3</th>\n",
       "      <th>Col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>11</td>\n",
       "      <td>D</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>13</td>\n",
       "      <td>E</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2 Col3  Col4\n",
       "a    A    11    D    14\n",
       "c    C    13    E    14"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.join(df2, how='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "id": "88PjCW2ofM-y",
    "outputId": "e98270bc-8965-41fc-accb-5d32991d39c2"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Col1</th>\n",
       "      <th>Col2</th>\n",
       "      <th>Col3</th>\n",
       "      <th>Col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>A</td>\n",
       "      <td>11.0</td>\n",
       "      <td>D</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>B</td>\n",
       "      <td>12.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>C</td>\n",
       "      <td>13.0</td>\n",
       "      <td>E</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Col1  Col2 Col3  Col4\n",
       "a    A  11.0    D  14.0\n",
       "b    B  12.0  NaN   NaN\n",
       "c    C  13.0    E  14.0\n",
       "d  NaN   NaN    F  16.0"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.join(df2, how='outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rLBMe6CINpzu"
   },
   "source": [
    "Learn more about Merging, Joining, and Concatenating the Pandas Dataframes [here](https://pandas.pydata.org/docs/user_guide/merging.html). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I86Tf0VYfTJT"
   },
   "source": [
    "<a name='1-8'></a>\n",
    "### H. Beyond Dataframes: Working with CSV and Excel\n",
    "\n",
    "In this last section of Pandas' fundamentals, we will see how to read real world data with different formats: CSV and Excel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BQD9appAk64O"
   },
   "source": [
    "#### CSV and Excel\n",
    "\n",
    "Let's use california housing dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {
    "id": "K7rfiyx6ivZ_"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 1390k  100 1390k    0     0   409k      0  0:00:03  0:00:03 --:--:--  409k\n"
     ]
    }
   ],
   "source": [
    "# Let's download the data \n",
    "\n",
    "!curl -O https://raw.githubusercontent.com/nyandwi/public_datasets/master/housing.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('housing.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "bbASBwqilYCZ",
    "outputId": "6d0a180e-2887-4346-a55e-8ebf02d0a793"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>longitude</th>\n",
       "      <th>latitude</th>\n",
       "      <th>housing_median_age</th>\n",
       "      <th>total_rooms</th>\n",
       "      <th>total_bedrooms</th>\n",
       "      <th>population</th>\n",
       "      <th>households</th>\n",
       "      <th>median_income</th>\n",
       "      <th>median_house_value</th>\n",
       "      <th>ocean_proximity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-122.23</td>\n",
       "      <td>37.88</td>\n",
       "      <td>41.0</td>\n",
       "      <td>880.0</td>\n",
       "      <td>129.0</td>\n",
       "      <td>322.0</td>\n",
       "      <td>126.0</td>\n",
       "      <td>8.3252</td>\n",
       "      <td>452600.0</td>\n",
       "      <td>NEAR BAY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-122.22</td>\n",
       "      <td>37.86</td>\n",
       "      <td>21.0</td>\n",
       "      <td>7099.0</td>\n",
       "      <td>1106.0</td>\n",
       "      <td>2401.0</td>\n",
       "      <td>1138.0</td>\n",
       "      <td>8.3014</td>\n",
       "      <td>358500.0</td>\n",
       "      <td>NEAR BAY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-122.24</td>\n",
       "      <td>37.85</td>\n",
       "      <td>52.0</td>\n",
       "      <td>1467.0</td>\n",
       "      <td>190.0</td>\n",
       "      <td>496.0</td>\n",
       "      <td>177.0</td>\n",
       "      <td>7.2574</td>\n",
       "      <td>352100.0</td>\n",
       "      <td>NEAR BAY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-122.25</td>\n",
       "      <td>37.85</td>\n",
       "      <td>52.0</td>\n",
       "      <td>1274.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>558.0</td>\n",
       "      <td>219.0</td>\n",
       "      <td>5.6431</td>\n",
       "      <td>341300.0</td>\n",
       "      <td>NEAR BAY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-122.25</td>\n",
       "      <td>37.85</td>\n",
       "      <td>52.0</td>\n",
       "      <td>1627.0</td>\n",
       "      <td>280.0</td>\n",
       "      <td>565.0</td>\n",
       "      <td>259.0</td>\n",
       "      <td>3.8462</td>\n",
       "      <td>342200.0</td>\n",
       "      <td>NEAR BAY</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \\\n",
       "0    -122.23     37.88                41.0        880.0           129.0   \n",
       "1    -122.22     37.86                21.0       7099.0          1106.0   \n",
       "2    -122.24     37.85                52.0       1467.0           190.0   \n",
       "3    -122.25     37.85                52.0       1274.0           235.0   \n",
       "4    -122.25     37.85                52.0       1627.0           280.0   \n",
       "\n",
       "   population  households  median_income  median_house_value ocean_proximity  \n",
       "0       322.0       126.0         8.3252            452600.0        NEAR BAY  \n",
       "1      2401.0      1138.0         8.3014            358500.0        NEAR BAY  \n",
       "2       496.0       177.0         7.2574            352100.0        NEAR BAY  \n",
       "3       558.0       219.0         5.6431            341300.0        NEAR BAY  \n",
       "4       565.0       259.0         3.8462            342200.0        NEAR BAY  "
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "VetH2qu5mnFM",
    "outputId": "0321f018-e42a-4976-82db-5889461d1f3f"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {
    "id": "Ca5qySmRmsz0"
   },
   "outputs": [],
   "source": [
    "## Exporting dataframe back to csv\n",
    "\n",
    "data.to_csv('housing_dataset', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LmqPzh9-nWgQ"
   },
   "source": [
    "If you look into the folder sidebar, you can see `Housing Dataset`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {
    "id": "vseWxDRYm1iA"
   },
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'openpyxl'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m/var/folders/9x/fscj3yx566q3y3y1kf5yh9m40000gn/T/ipykernel_1348/1131967869.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m## Exporting CSV to Excel\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_excel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'housing_excel.xlsx'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/miniforge3/envs/TensorPro/lib/python3.9/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mto_excel\u001b[0;34m(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes, storage_options)\u001b[0m\n\u001b[1;32m   2282\u001b[0m             \u001b[0minf_rep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minf_rep\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2283\u001b[0m         )\n\u001b[0;32m-> 2284\u001b[0;31m         formatter.write(\n\u001b[0m\u001b[1;32m   2285\u001b[0m             \u001b[0mexcel_writer\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2286\u001b[0m             \u001b[0msheet_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msheet_name\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniforge3/envs/TensorPro/lib/python3.9/site-packages/pandas/io/formats/excel.py\u001b[0m in \u001b[0;36mwrite\u001b[0;34m(self, writer, sheet_name, startrow, startcol, freeze_panes, engine, storage_options)\u001b[0m\n\u001b[1;32m    832\u001b[0m             \u001b[0;31m# error: Cannot instantiate abstract class 'ExcelWriter' with abstract\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    833\u001b[0m             \u001b[0;31m# attributes 'engine', 'save', 'supported_extensions' and 'write_cells'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 834\u001b[0;31m             writer = ExcelWriter(  # type: ignore[abstract]\n\u001b[0m\u001b[1;32m    835\u001b[0m                 \u001b[0mwriter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstorage_options\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstorage_options\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    836\u001b[0m             )\n",
      "\u001b[0;32m~/miniforge3/envs/TensorPro/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, path, engine, date_format, datetime_format, mode, storage_options, if_sheet_exists, engine_kwargs, **kwargs)\u001b[0m\n\u001b[1;32m     46\u001b[0m     ):\n\u001b[1;32m     47\u001b[0m         \u001b[0;31m# Use the openpyxl module as the Excel writer.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 48\u001b[0;31m         \u001b[0;32mfrom\u001b[0m \u001b[0mopenpyxl\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mworkbook\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mWorkbook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     49\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     50\u001b[0m         \u001b[0mengine_kwargs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcombine_kwargs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mengine_kwargs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'openpyxl'"
     ]
    }
   ],
   "source": [
    "## Exporting CSV to Excel\n",
    "\n",
    "data.to_excel('housing_excel.xlsx', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0DinnjCfnt3M"
   },
   "outputs": [],
   "source": [
    "## Reading the Excel file back\n",
    "\n",
    "excel_data = pd.read_excel('housing_excel.xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "id": "mIKA-KwYiFgZ",
    "outputId": "cdbf9eed-a457-400c-a7d8-5620bc275001"
   },
   "outputs": [],
   "source": [
    "\n",
    "excel_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HK8qIyAEjRsM"
   },
   "source": [
    "<a name='2'></a>\n",
    "## Real World: Exploratory Data Analysis (EDA)\n",
    "\n",
    "All above was the basics. Let us apply some of these techniques to the real world dataset, `Red wine quality`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "YEWQ8U-FjTVo",
    "outputId": "cb81a5a4-43d4-4821-e29a-84e243e40aff"
   },
   "outputs": [],
   "source": [
    "!curl -O https://raw.githubusercontent.com/nyandwi/public_datasets/master/winequality-red.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "EBXDmKTlmy0X"
   },
   "outputs": [],
   "source": [
    "wine_data = pd.read_csv('winequality-red.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 221
    },
    "id": "Gr93h3Xgm8vF",
    "outputId": "2510e3ee-7ff4-46fc-e19a-5bf67b50edd8"
   },
   "outputs": [],
   "source": [
    "# Displaying the head of the dataset\n",
    "\n",
    "wine_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 221
    },
    "id": "BrBYEf9Dqg9K",
    "outputId": "b240cb52-ba5b-4286-a42a-8ea7feafe135"
   },
   "outputs": [],
   "source": [
    "# Displaying the tail of the dataset\n",
    "\n",
    "wine_data.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 421
    },
    "id": "qauq0ujDnocm",
    "outputId": "23d06fc6-738b-4ad8-8582-3fd6e916cf18"
   },
   "outputs": [],
   "source": [
    "# Displaying summary statistics\n",
    "\n",
    "wine_data.describe().transpose()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "qPCZ3X8WqN4j",
    "outputId": "85edec22-5d4e-4f79-c0aa-af7ac9f0700e"
   },
   "outputs": [],
   "source": [
    "# Displaying quick information about the dataset \n",
    "\n",
    "wine_data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "LWL8QJ_XqTCJ",
    "outputId": "17558d3d-ac51-443d-d7f4-08c0a5d7f016"
   },
   "outputs": [],
   "source": [
    "# Checking missing values\n",
    "\n",
    "wine_data.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8SnzW2ibsmYp",
    "outputId": "41953dab-5aa4-4470-e412-7442db0bf2c8"
   },
   "outputs": [],
   "source": [
    "# wine quality range from 0 to 10. The higher the quality value, the good wine is\n",
    "\n",
    "wine_data['quality'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 484
    },
    "id": "ij8yGjqnt2HD",
    "outputId": "9a237e6a-69da-4937-d612-3b8e01cf3309"
   },
   "outputs": [],
   "source": [
    "wine_data.groupby(['fixed acidity', 'volatile acidity', 'citric acid']).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 484
    },
    "id": "8hl4EQpNt680",
    "outputId": "d637c1ad-f4d8-4f38-e5ba-f63a24ccdd47"
   },
   "outputs": [],
   "source": [
    "wine_data.groupby(['free sulfur dioxide', 'total sulfur dioxide']).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the end of the lab that was about using Pandas to manipulate data. Alot of things will make sense when we start to prepare data for machine learning models in the next notebooks. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name='0'></a>\n",
    "\n",
    "### [BACK TO TOP](#0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "2. Data Analysis & Manipulation with Pandas.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
